Abstract — We introduce a new divergence measure, the bounded Bhattacharyya distance (BBD), for quantifying the dissimilarity between probability distributions. BBD is based on the Bhattacharyya coefficient (fidelity), and is symmetric, positive semi-definite, and bounded. Unlike the Kullback-Leibler divergence, BBD does not require probability density functions to be absolutely continuous with respect to each other. We show that BBD belongs to the class of Csiszár f-divergences and derive certain relationships between BBD and well-known measures such as the Bhattacharyya, Hellinger and Jensen-Shannon divergences. Bounds on the Bayesian error probability are established with the BBD measure. We show that the curvature of BBD in the parameter space of families of distributions is proportional to the Fisher information. For distributions with vector-valued parameters, the curvature matrix can be used to obtain the Rao geodesic distance between them. We also discuss a potential application of probability distance measures in model selection.

Index Terms — Signal detection, Bhattacharyya distance, divergence, dissimilarity measure, f-divergence, error probability.

I. Introduction 

Divergence measures for the distance between two probability distributions have been extensively studied in the last six decades [1], [2], [3], [4], [5]. These measures are widely used in varied fields such as pattern recognition [6], [7], [8], signal detection [9], [10], Bayesian model validation [11] and quantum information theory [12], [13]. Distance measures try to achieve two main objectives (which are not mutually exclusive): to assess (1) how "close" two distributions are compared to others, and (2) how "easy" it is to distinguish one pair compared to another [?].

There is a plethora of distance measures available to assess the convergence (or divergence) of probability distributions. Many of these measures are not metrics in the strict sense, because they may fail to satisfy the symmetry of arguments or the triangle inequality. In applications, the choice of the measure depends on the interpretation of the metric in terms of the problem considered, its analytical properties and its ease of computation [14]. One of the most well-known divergence measures is the Kullback-Leibler divergence (KLD) [1], [4]. Although it is widely used, KLD can create problems in specific applications. Specifically, it is unbounded above and requires that the distributions be 'absolutely continuous' with respect to each other. Various other information-theoretic measures have been introduced with ease of computation and utility in signal selection and pattern recognition in mind. Of these measures, the Bhattacharyya distance [?], [15], [16] and the Chernoff distance [?], [17], [18] have been widely used in signal processing. However, these measures are again unbounded from above. Many bounded divergence measures, such as the variational distance, the Hellinger distance [?], [19] and the Jensen-Shannon metric [20], [21], [22], have been studied extensively. However, the utility of these measures varies depending on properties such as the tightness of bounds on error probabilities, information-theoretic interpretation, and generalization to multiple probability distributions. Here we introduce a bounded measure based on the Bhattacharyya coefficient, which shares a close relationship with the Hellinger and Jensen-Shannon divergences. Our bounded Bhattacharyya distance (BBD) measure belongs to the class of f-divergences and thus inherits all their general properties. We prove an extension of the Bradt-Karlin theorem for BBD, which shows the existence of prior probabilities for which the ranking of divergences is mapped to the ranking of Bayes error probabilities. Based on the Bhattacharyya coefficient, we also show upper and lower bounds on the error probabilities. For many applications in biology, more than two probability measures have to be distinguished from each other. Following Rao [21] and Lin [22], we introduce a generalized BBD measure for a generic set of probability distributions. We also propose a potential application of distance measures to model selection.

Our paper is organized as follows. Section I is the current introduction. In Section II, we discuss the well-known Kullback-Leibler and Bhattacharyya divergence measures and introduce our bounded measure. In Section III, we derive several properties of our measure, such as positive semi-definiteness, its relation to the Hellinger and Jensen-Shannon metrics, and its utility for computing the probability of error. Generalized BBD measures are discussed in Section III-C. In Section IV, the relation with the Fisher information and the Rao differential metric is derived. In Section V, we propose a method for using probability distance measures for goodness of fit and model selection. In the Appendix, we provide the expressions for BBD measures for some commonly used distributions.

II. Divergence measures 

In the following subsection, we consider a measurable space $\Omega$ with a $\sigma$-algebra $\mathcal{B}$, and the set $\mathcal{M}$ of all probability measures on $(\Omega, \mathcal{B})$. Let $P$ and $Q$ denote probability measures on $(\Omega, \mathcal{B})$, and let $p$ and $q$ denote their densities with respect to a common measure $\mu$. We recall the definition of absolute continuity:

Absolute continuity: A measure $P$ on $(\Omega, \mathcal{B})$ is absolutely continuous with respect to a measure $Q$ if $P(A) = 0$ for every $A \in \mathcal{B}$ for which $Q(A) = 0$. This is denoted by $P \ll Q$.

A. Kullback-Leibler divergence 

The Kullback-Leibler divergence (KLD), or relative entropy [?], [?], between two distributions $P$, $Q$ with densities $p(x)$ and $q(x)$ is given by:

$$ I(P,Q) = \int p(x) \log \frac{p(x)}{q(x)}\, dx. \tag{1} $$

The symmetrized version is given by $I_{\mathrm{symm}}(P,Q) = \big(I(P,Q) + I(Q,P)\big)/2$ [9], and $I(P,Q) \in [0, \infty]$. The divergence is infinite if $q(x) = 0$ and $p(x) \neq 0$ on a set of positive measure.

KLD is defined only when $P$ is absolutely continuous w.r.t. $Q$. This feature can be problematic in numerical computations when the measured distribution has zero values.
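The following minimal Python sketch shows the absolute-continuity issue. It assumes discrete distributions given as probability lists over a common finite support; the helper names `kl_divergence` and `symmetric_kl` are hypothetical. The code uses natural logarithms and the convention $0 \log 0 = 0$. It returns infinity as soon as $q(x) = 0$ while $p(x) > 0$.

```python
import math


def kl_divergence(p, q):
    """Discrete KLD I(P, Q) = sum_x p(x) ln(p(x) / q(x)), in nats.

    Terms with p(x) = 0 contribute 0 (convention 0 log 0 = 0). If q(x) = 0
    while p(x) > 0, P is not absolutely continuous w.r.t. Q and I(P, Q) = inf.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


def symmetric_kl(p, q):
    """Symmetrized version I_symm(P, Q) = (I(P, Q) + I(Q, P)) / 2."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
print(kl_divergence(p, q))  # finite: P puts no mass where q(x) = 0
print(kl_divergence(q, p))  # infinite: Q puts mass where p(x) = 0
print(symmetric_kl(p, q))   # infinite as well
```

The asymmetry is visible directly: $P \ll Q$ holds for this pair, but $Q \ll P$ does not.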

B. Bhattacharyya Distance 

The Bhattacharyya distance is a widely used measure in signal selection and pattern recognition [9]. It is defined as:

$$ B(P,Q) = -\ln\left(\int \sqrt{p(x)q(x)}\,dx\right) = -\ln(\rho), \tag{2} $$

where the term in parentheses, $\rho(P,Q) = \int \sqrt{p(x)q(x)}\,dx$, is called the Bhattacharyya coefficient [23], [13] in pattern recognition, the affinity in theoretical statistics, and the fidelity in quantum information theory. Unlike KLD, the Bhattacharyya distance avoids the requirement of absolute continuity. It is a special case of the Chernoff distance

$$ C_\alpha(P,Q) = -\ln\left(\int p^\alpha(x)\,q^{1-\alpha}(x)\,dx\right) $$

with $\alpha = 1/2$. For discrete probability distributions, $\rho \in [0, 1]$ is interpreted as a scalar product of the probability vectors $P = (\sqrt{p_1}, \sqrt{p_2}, \ldots, \sqrt{p_n})$ and $Q = (\sqrt{q_1}, \sqrt{q_2}, \ldots, \sqrt{q_n})$. The Bhattacharyya distance is symmetric, positive semi-definite, and unbounded ($0 \le B \le \infty$). It is finite as long as there exists some region $S \subset \mathcal{X}$ of positive measure such that $p(x)q(x) \neq 0$ whenever $x \in S$.
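The discrete case can be sketched in a few lines of Python, assuming probability lists over a common finite support. `bhattacharyya_coefficient` and `bhattacharyya_distance` are illustrative names, not an existing library API.

```python
import math


def bhattacharyya_coefficient(p, q):
    """rho(P, Q) = sum_x sqrt(p(x) q(x)), the scalar product of the
    square-root probability vectors."""
    return sum(math.sqrt(px * qx) for px, qx in zip(p, q))


def bhattacharyya_distance(p, q):
    """B(P, Q) = -ln(rho), Eq. (2); infinite when the supports do not overlap."""
    rho = bhattacharyya_coefficient(p, q)
    return math.inf if rho == 0.0 else -math.log(rho)


p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
r = [0.0, 0.0, 1.0]
print(bhattacharyya_distance(p, q), bhattacharyya_distance(q, p))  # finite and equal
print(bhattacharyya_distance(p, r))  # inf: disjoint supports
```

Here $B(P,Q)$ is finite even though $Q$ puts mass where $P$ has none. Two distributions with disjoint supports, by contrast, give $B = \infty$.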

C. Bounded Bhattacharyya distance measure 

In many applications, boundedness is required in addition to the desirable properties of the Bhattacharyya distance. We propose a new bounded measure of Bhattacharyya distance as below:

$$ \zeta(P,Q) = -\log_2\left(\frac{1 + \int \sqrt{p(x)q(x)}\,dx}{2}\right) = 1 - \log_2\big(1 + \rho(P,Q)\big). \tag{3} $$

With the choice of base 2 for the logarithm, we normalize the maximum value to 1. For convenience, we shall also refer to the bounded Bhattacharyya measure as the zeta measure, and use the terms BBD and $\zeta$ interchangeably. The term under the logarithm in Eq. (3) is the average of the complete overlap (equal to 1) and the actual overlap ($\rho$) of the two distributions. From this measure, we can form another closely related measure by taking the square root of $\zeta$:

$$ \xi(P,Q) = \sqrt{\zeta(P,Q)}. \tag{4} $$

This measure is also symmetric, positive semi-definite and bounded in $[0,1]$. We will not delve into the properties of the $\xi$ measure in detail in this paper.

Fig. 1. Comparison of Hellinger and bounded Bhattacharyya distance measures $\zeta$ and $\xi$.
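The following short sketch implements Eqs. (3) and (4) for discrete distributions. It assumes probability lists over a common finite support, and the helper names are hypothetical. It evaluates $\zeta$ as $1 - \log_2(1 + \rho)$, which is algebraically identical to Eq. (3).

```python
import math


def bhattacharyya_coefficient(p, q):
    """rho(P, Q) = sum_x sqrt(p(x) q(x)) for discrete distributions."""
    return sum(math.sqrt(px * qx) for px, qx in zip(p, q))


def zeta(p, q):
    """Bounded Bhattacharyya distance, Eq. (3), written as 1 - log2(1 + rho)."""
    return 1.0 - math.log2(1.0 + bhattacharyya_coefficient(p, q))


def xi(p, q):
    """Square-root variant, Eq. (4)."""
    return math.sqrt(zeta(p, q))


p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
r = [0.0, 0.0, 1.0]  # support disjoint from that of p
print(zeta(p, p), zeta(p, q), zeta(p, r))
print(xi(p, q))
```

Identical distributions give $\zeta = 0$, and distributions with disjoint supports give $\zeta = 1$. This is the finite counterpart of $B = \infty$.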

Another widely used bounded measure based on the Bhattacharyya coefficient is the Hellinger distance [?], [9], [24], [25]:

$$ H(P,Q) = \sqrt{1 - \rho(P,Q)}. \tag{5} $$

We note that $H \in [0, 1]$ and is concave in $\rho$ ($\partial^2_\rho H = -\tfrac{1}{4}(1-\rho)^{-3/2} < 0$). In contrast, our $\zeta$ measure is convex in $\rho$ ($\partial^2_\rho \zeta = \frac{1}{(1+\rho)^2 \ln 2} > 0$), whereas $\xi$ is neither concave nor convex. Fig. 1 compares the Hellinger distance with our $\zeta$ and $\xi$ measures as functions of the Bhattacharyya coefficient.
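The curves compared in Fig. 1 depend only on $\rho$, so they can be tabulated directly. The sketch below (hypothetical helper names) evaluates $H$, $\zeta$ and $\xi$ on a grid of $\rho$. It then uses a central finite difference to check the curvature signs stated above. The finite difference is an approximation chosen here for illustration.

```python
import math


def hellinger(rho):
    """Eq. (5) as a function of the Bhattacharyya coefficient."""
    return math.sqrt(1.0 - rho)


def zeta(rho):
    """Eq. (3) as a function of rho."""
    return 1.0 - math.log2(1.0 + rho)


def xi(rho):
    """Eq. (4) as a function of rho."""
    return math.sqrt(zeta(rho))


def second_difference(fn, rho, h=1e-3):
    """Central finite-difference estimate of d^2 fn / d rho^2."""
    return (fn(rho + h) - 2.0 * fn(rho) + fn(rho - h)) / h ** 2


for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"rho={rho:.2f}  H={hellinger(rho):.4f}  zeta={zeta(rho):.4f}  xi={xi(rho):.4f}")

for rho in (0.1, 0.5, 0.9):
    print(rho,
          second_difference(hellinger, rho),  # expected negative (concave)
          second_difference(zeta, rho),       # expected positive (convex)
          second_difference(xi, rho))         # sign depends on rho
```

The second difference of $\xi$ is positive near $\rho = 0$ and negative near $\rho = 1$. This is consistent with $\xi$ being neither concave nor convex.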

III. Properties of $\zeta$ and $\xi$ measures

Theorem III.1 (Positive semi-definite). The $\zeta$ measure is symmetric, positive semi-definite and bounded in the interval $[0, 1]$.

Proof: By the arithmetic-geometric mean inequality, we obtain

$$ 0 \le \int \sqrt{p(x)q(x)}\,dx \le \int \frac{p(x) + q(x)}{2}\,dx = 1, \tag{6} $$

which leads to

$$ 0 \le -\log_2\left(\frac{1 + \int \sqrt{p(x)q(x)}\,dx}{2}\right) \le 1, \tag{7} $$

and hence

$$ 0 \le \zeta(P,Q) \le 1. \tag{8} $$

Special cases:

$$ \zeta(P,Q) = \begin{cases} 0 & \text{if } P = Q \text{ almost everywhere},\\ 1 & \text{if } p(x)q(x) = 0 \ \ \forall x \in \mathcal{X},\\ \in (0,1) & \text{otherwise}. \end{cases} \tag{9} $$

$\zeta$ is symmetric by inspection:

$$ \zeta(P,Q) = \zeta(Q,P). \tag{10} $$
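The following is a randomized sanity check of Theorem III.1, not a substitute for the proof. It assumes random discrete distributions on five points drawn with a fixed seed; `random_distribution` is a hypothetical helper.

```python
import math
import random


def zeta(p, q):
    """Eq. (3) for discrete distributions, as 1 - log2(1 + rho)."""
    rho = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 1.0 - math.log2(1.0 + rho)


def random_distribution(n, rng):
    """Random probability vector of length n (hypothetical test helper)."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    z = zeta(p, q)
    assert -1e-12 <= z <= 1.0 + 1e-12   # Eq. (8), up to rounding
    assert z == zeta(q, p)              # Eq. (10)
print("bounds and symmetry hold on 1000 random pairs")
```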



Error probability: The optimal Bayes error probability (see e.g. [7], [26]) for classifying two events $P_1$, $P_2$ with densities $p_1(x)$ and $p_2(x)$ and prior probabilities $\Pi = \{\pi_1, \pi_2\}$ is given by

$$ P_e = \int \min\big[\pi_1 p_1(x),\ \pi_2 p_2(x)\big]\,dx. \tag{11} $$

Let $p_i(x)$ ($i = 1, 2$) be parameterized by $\alpha$ or $\beta$. In the signal detection literature, a signal set $\alpha$ is considered better than a set $\beta$ for the densities $p_i(x)$ ($i = 1, 2$) when the error probability is smaller for $\alpha$ than for $\beta$ [9].
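The following sketch implements Eq. (11) for discrete densities. The two signal sets `alpha` and `beta` are hypothetical examples chosen for illustration. Each is given as a pair of densities $(p_1, p_2)$ on three points.

```python
def bayes_error(p1, p2, pi1=0.5):
    """Optimal Bayes error of Eq. (11) for discrete densities:
    sum_x min(pi1 * p1(x), pi2 * p2(x)) with pi2 = 1 - pi1."""
    pi2 = 1.0 - pi1
    return sum(min(pi1 * a, pi2 * b) for a, b in zip(p1, p2))


# Two hypothetical signal sets, each giving a pair of densities (p1, p2).
alpha = ([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
beta = ([0.6, 0.3, 0.1], [0.2, 0.3, 0.5])
for name, (p1, p2) in (("alpha", alpha), ("beta", beta)):
    print(name, bayes_error(p1, p2))
```

With equal priors, `alpha` yields the smaller error probability, so it is the better signal set in the sense described above.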

We can also rank the parameters by means of some divergence $D$. That is, we can say that the set $\alpha$ is better (in the divergence sense) than the set $\beta$ if $D_\alpha(P_1, P_2)$ is larger than $D_\beta(P_1, P_2)$. In general, it is not true that $D_\alpha(P_1,P_2) > D_\beta(P_1,P_2)$ implies $P_e(\alpha) < P_e(\beta)$. Bradt and Karlin proved the following theorem relating error probabilities and the symmetric KLD:

Theorem III.2 (Bradt and Karlin [27]). Let $J$ denote the symmetric KLD. If $J_\alpha(P_1,P_2) > J_\beta(P_1,P_2)$, then there exists a set of prior probabilities $\Pi = \{\pi_1, \pi_2\}$ for the two hypotheses $g_1$, $g_2$ for which

$$ P_e(\alpha, \Pi) < P_e(\beta, \Pi), \tag{12} $$

where $P_e(\alpha, \Pi)$ is the error probability with parameter $\alpha$ and prior probability $\Pi$.

The theorem asserts existence but gives no method of finding these prior probabilities. Kailath [9] proved the applicability of the Bradt-Karlin theorem for the Bhattacharyya distance measure. We follow the same route and show that the $\zeta$ measure satisfies a similar property, using the following theorem by Blackwell.

Theorem III.3 (Blackwell [28]). $P_e(\beta, \Pi) \le P_e(\alpha, \Pi)$ for all prior probabilities $\Pi$ if and only if

$$ E_\beta\big[\Phi(L_\beta) \mid g_2\big] \le E_\alpha\big[\Phi(L_\alpha) \mid g_2\big] $$

for all continuous concave functions $\Phi(L)$. Here $L_\omega = p_1(x, \omega)/p_2(x, \omega)$ is the likelihood ratio with $\omega \in \{\alpha, \beta\}$, and $E_\omega[\Phi(L_\omega) \mid g_2]$ is the expectation of $\Phi(L_\omega)$ under the hypothesis $g_2$.

Theorem III.4. If $\zeta(\alpha) > \zeta(\beta)$, or equivalently $\rho(\alpha) < \rho(\beta)$, then there exists a set of prior probabilities $\Pi = \{\pi_1, \pi_2\}$ for the two hypotheses $g_1$, $g_2$ for which

$$ P_e(\alpha, \Pi) < P_e(\beta, \Pi). \tag{13} $$



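Proof: The proof closely follows Kailath [9]. First note that $\sqrt{L}$ is a concave function of the likelihood ratio $L$, and

$$ \rho(\alpha) = \sum_{x \in \mathcal{X}} \sqrt{p_1(x,\alpha)\,p_2(x,\alpha)} = \sum_{x \in \mathcal{X}} p_2(x,\alpha) \sqrt{\frac{p_1(x,\alpha)}{p_2(x,\alpha)}} = E_\alpha\big[\sqrt{L_\alpha} \mid g_2\big]. \tag{14} $$

Similarly,

$$ \rho(\beta) = E_\beta\big[\sqrt{L_\beta} \mid g_2\big]. \tag{15} $$

Hence

$$ \rho(\alpha) < \rho(\beta) \implies E_\alpha\big[\sqrt{L_\alpha} \mid g_2\big] < E_\beta\big[\sqrt{L_\beta} \mid g_2\big]. \tag{16} $$

Suppose the assertion of the theorem is not true. Then $P_e(\beta, \Pi) \le P_e(\alpha, \Pi)$ for all $\Pi$. By Theorem III.3, $E_\beta[\Phi(L_\beta) \mid g_2] \le E_\alpha[\Phi(L_\alpha) \mid g_2]$ for every continuous concave $\Phi$, and in particular for $\Phi(L) = \sqrt{L}$. This contradicts Eq. (16). ■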
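The theorem asserts existence only. As an illustrative heuristic (not part of the proof), the sketch below scans a grid of priors for one with $P_e(\alpha, \Pi) < P_e(\beta, \Pi)$. It uses hypothetical discrete signal sets, and the grid search and its tolerance are our own choices.

```python
import math


def rho(p1, p2):
    """Bhattacharyya coefficient of a discrete pair of densities."""
    return sum(math.sqrt(a * b) for a, b in zip(p1, p2))


def bayes_error(p1, p2, pi1):
    """Eq. (11) with priors (pi1, 1 - pi1)."""
    return sum(min(pi1 * a, (1.0 - pi1) * b) for a, b in zip(p1, p2))


def find_prior(alpha, beta, steps=1000, tol=1e-12):
    """Scan pi1 = k / steps, k = 1..steps-1, for a prior with P_e(alpha) < P_e(beta).

    Theorem III.4 guarantees such a prior exists when rho(alpha) < rho(beta),
    but a finite grid is only a heuristic and may miss it (returns None)."""
    for k in range(1, steps):
        pi1 = k / steps
        if bayes_error(*alpha, pi1) < bayes_error(*beta, pi1) - tol:
            return pi1
    return None


# Hypothetical signal sets: each is a pair (p1, p2) of densities on three points.
alpha = ([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
beta = ([0.6, 0.3, 0.1], [0.2, 0.3, 0.5])
print(rho(*alpha), rho(*beta))
print(find_prior(alpha, beta))
```

The tolerance guards against declaring a strict inequality from rounding noise when both error probabilities are equal, for example at very small priors.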

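Theorem III.5. $0 \le \zeta(P,Q) \le \xi(P,Q) \le H(P,Q)$.

Proof: We have already shown in Theorem III.1 that $0 \le \zeta \le 1$. Since $\xi = \sqrt{\zeta}$, it follows that $0 \le \zeta(P,Q) \le \xi(P,Q)$. We use the generalized Bernoulli inequality

$$ (1 + x)^r \le 1 + rx, \qquad 0 \le r \le 1,\ x > -1. \tag{17} $$

Set $x = 1$ and $r = \rho = \int \sqrt{p(x)q(x)}\,dx$, the Bhattacharyya coefficient. Hence

$$ 1 \le (1 + 1)^\rho = 2^\rho \le 1 + \rho \le 2, \tag{18} $$

$$ 0 \le \rho \le \log_2(1 + \rho) \le 1. \tag{19} $$

Hence we get

$$ 1 \ge \sqrt{1 - \rho} \ge \sqrt{1 - \log_2(1 + \rho)} \ge 0, \quad \text{i.e.} \quad 1 \ge H \ge \xi \ge 0. \tag{20} $$

■

Theorem III.6. $\xi \le H \le \sqrt{\ln 4}\,\xi$, where the constants $1$ and $\sqrt{\ln 4}$ are sharp.

Proof: The sharp lower bound has been proved in Theorem III.5. Equality holds at $\rho = 0$, where $H = \xi = 1$. The sharpest upper bound is obtained by taking

$$ \sup_{\rho \in [0,1)} \frac{H(\rho)}{\xi(\rho)}. \tag{21} $$

Define

$$ f(\rho) = \frac{H(\rho)}{\xi(\rho)} = \sqrt{\frac{1 - \rho}{-\log_2\big((1 + \rho)/2\big)}} = \sqrt{g(\rho)}, \qquad g(\rho) = \frac{1 - \rho}{1 - \log_2(1 + \rho)}. \tag{22} $$

We note that $g(\rho)$ is continuous and has no singularities for $\rho \in [0, 1)$. Differentiating,

$$ g'(\rho) = \frac{\dfrac{1 - \rho}{1 + \rho} + \ln(1 + \rho) - \ln 2}{\ln 2\,\big(1 - \log_2(1 + \rho)\big)^2} > 0. $$

The inequality holds because the numerator has derivative $(\rho - 1)/(1 + \rho)^2 < 0$, so it decreases from $1 - \ln 2 > 0$ at $\rho = 0$ to $0$ at $\rho = 1$. It follows that $g$, and hence $f$, is non-decreasing. By L'Hôpital's rule,

$$ \sup_{\rho \in [0,1)} g(\rho) = \lim_{\rho \to 1} g(\rho) = \lim_{\rho \to 1} \frac{-1}{-1/\big((1 + \rho)\ln 2\big)} = 2\ln 2 = \ln 4. \tag{23} $$

Thus

$$ f(\rho) \le \sup_{\rho \in [0,1)} f(\rho) = \sqrt{\ln 4}, \quad \text{and hence} \quad H \le \sqrt{\ln 4}\,\xi. \tag{24} $$

■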
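The following is a numerical check of Theorems III.5 and III.6 on a grid of Bhattacharyya coefficients. It assumes the functional forms $H(\rho)$, $\zeta(\rho)$ and $\xi(\rho)$ from Eqs. (3) to (5). It verifies the ordering, and that $H/\xi$ increases toward $\sqrt{\ln 4} \approx 1.1774$ without exceeding it.

```python
import math


def hellinger(rho):
    """Eq. (5)."""
    return math.sqrt(1.0 - rho)


def zeta(rho):
    """Eq. (3) as a function of rho."""
    return 1.0 - math.log2(1.0 + rho)


def xi(rho):
    """Eq. (4)."""
    return math.sqrt(zeta(rho))


grid = [k / 1000 for k in range(1001)]

# Theorem III.5: 0 <= zeta <= xi <= H (small tolerance for rounding).
assert all(0.0 <= zeta(r) <= xi(r) <= hellinger(r) + 1e-12 for r in grid)

# Theorem III.6: H / xi grows from 1 at rho = 0 toward sqrt(ln 4) as rho -> 1.
ratios = [hellinger(r) / xi(r) for r in grid[:-1]]  # rho = 1 excluded (0 / 0)
assert all(a <= b + 1e-12 for a, b in zip(ratios, ratios[1:]))
assert max(ratios) <= math.sqrt(math.log(4.0))
print(ratios[0], ratios[-1], math.sqrt(math.log(4.0)))
```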



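Jensen-Shannon divergence: The Jensen difference between two distributions $P_1$, $P_2$ with densities $p_1(x)$, $p_2(x)$ and weights $(\lambda_1, \lambda_2)$, $\lambda_1 + \lambda_2 = 1$, is defined as

$$ J_{\lambda_1,\lambda_2}(P_1,P_2) = H(\lambda_1 p_1 + \lambda_2 p_2) - \lambda_1 H(p_1) - \lambda_2 H(p_2), \tag{25} $$

where, in this equation only, $H(\cdot)$ denotes the Shannon entropy rather than the Hellinger distance. The Jensen-Shannon divergence (JSD) [20], [21], [22] is based on the Jensen difference and is given by:

$$ JS(P,Q) = J_{1/2,1/2}(P,Q) = \frac{1}{2}\int \left[ p(x) \log \frac{2p(x)}{p(x) + q(x)} + q(x) \log \frac{2q(x)}{p(x) + q(x)} \right] dx. \tag{26} $$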
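The following minimal sketch implements Eq. (26) for discrete distributions on a common finite support, using natural logarithms. `jensen_shannon` is a hypothetical name. Terms with zero probability are skipped, so the value stays finite even when absolute continuity fails.

```python
import math


def jensen_shannon(p, q):
    """JS(P, Q) of Eq. (26) in nats; zero-probability terms contribute 0."""
    total = 0.0
    for a, b in zip(p, q):
        m = a + b
        if a > 0.0:
            total += a * math.log(2.0 * a / m)
        if b > 0.0:
            total += b * math.log(2.0 * b / m)
    return 0.5 * total


p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
r = [0.0, 0.0, 1.0]
print(jensen_shannon(p, q), jensen_shannon(p, r), math.log(2))
```

For disjoint supports the sketch returns $\ln 2$, which is the maximum of JSD in nats.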



The structure and goals of JSD and BBD measures are 
similar. The following theorem compares the two metrics using 
Jensen's inequality. 

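Lemma III.7 (Jensen's inequality). For a convex function $\psi$,

$$ E[\psi(X)] \ge \psi(E[X]). $$

Theorem III.8 (Relation to Jensen-Shannon measure). With natural logarithms in Eq. (26),

$$ JS(P,Q) \ge \frac{2}{\log_2 e}\,\zeta(P,Q) - \ln 2 = \ln 2\,\big(2\zeta(P,Q) - 1\big). $$

For the proof, we use the un-symmetrized Jensen-Shannon metric $JS_u(P,Q) = \int p(x) \ln \frac{2p(x)}{p(x) + q(x)}\,dx$.

Proof:

$$ \begin{aligned} JS_u(P,Q) &= \int p(x) \ln \frac{2p(x)}{p(x) + q(x)}\,dx \\ &\ge \int p(x) \ln \frac{2p(x)}{\big(\sqrt{p(x)} + \sqrt{q(x)}\big)^2}\,dx \qquad \big(\text{since } \sqrt{p(x) + q(x)} \le \sqrt{p(x)} + \sqrt{q(x)}\big) \\ &= -2\,E_P\left[\ln \frac{\sqrt{p(x)} + \sqrt{q(x)}}{\sqrt{2p(x)}}\right] \\ &\ge -2 \ln E_P\left[\frac{\sqrt{p(x)} + \sqrt{q(x)}}{\sqrt{2p(x)}}\right] \qquad \big(\text{by Jensen's inequality, } E[-\ln Y] \ge -\ln E[Y]\big) \\ &= -2 \ln \frac{1 + \rho}{\sqrt{2}} = -2\ln(1 + \rho) + \ln 2 \\ &= 2\ln 2\,\zeta(P,Q) - \ln 2. \end{aligned} \tag{27} $$

The last step uses $\ln(1 + \rho) = \ln 2\,\big(1 - \zeta(P,Q)\big)$, which follows from Eq. (3). Since $\zeta$ is symmetric, the same bound holds for $JS_u(Q,P)$. Averaging the two terms of Eq. (26) therefore gives the result:

$$ JS(P,Q) \ge \frac{2}{\log_2 e}\,\zeta(P,Q) - \ln 2. \tag{28} $$

■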
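The inequality of Theorem III.8 can be spot-checked numerically. This sketch assumes random discrete distributions with some zero entries, drawn with a fixed seed, and uses hypothetical helpers. It asserts $JS(P,Q) \ge \ln 2\,(2\zeta(P,Q) - 1)$ on many pairs; it is a sanity check, not a proof.

```python
import math
import random


def jensen_shannon(p, q):
    """Eq. (26) in nats; zero-probability terms contribute 0."""
    total = 0.0
    for a, b in zip(p, q):
        m = a + b
        if a > 0.0:
            total += a * math.log(2.0 * a / m)
        if b > 0.0:
            total += b * math.log(2.0 * b / m)
    return 0.5 * total


def zeta(p, q):
    """Eq. (3) as 1 - log2(1 + rho)."""
    rho = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 1.0 - math.log2(1.0 + rho)


def random_distribution(n, rng, p_zero=0.3):
    """Random probability vector whose entries are zero with probability p_zero."""
    w = [0.0 if rng.random() < p_zero else rng.random() for _ in range(n)]
    if sum(w) == 0.0:
        w[rng.randrange(n)] = 1.0
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(1)
for _ in range(2000):
    p = random_distribution(6, rng)
    q = random_distribution(6, rng)
    bound = math.log(2.0) * (2.0 * zeta(p, q) - 1.0)  # Theorem III.8
    assert jensen_shannon(p, q) >= bound - 1e-12
print("Theorem III.8 holds on 2000 random pairs")
```

The bound is attained when the supports are disjoint, where both sides equal $\ln 2$.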



A. Bounds on Error Probability 

Error probabilities are hard to calculate in general, so tight bounds on $P_e$ are often extremely useful in practice. Kailath [9] has shown bounds on $P_e$ in terms of the Bhattacharyya coefficient $\rho$:

$$ \frac{1}{2}\left[1 - \sqrt{1 - 4\pi_1\pi_2\rho^2}\right] \le P_e \le \sqrt{\pi_1\pi_2}\,\rho, \tag{29} $$

with $\pi_1 + \pi_2 = 1$. If the priors are equal, $\pi_1 = \pi_2 = 1/2$, the expression simplifies to

$$ \frac{1}{2}\left[1 - \sqrt{1 - \rho^2}\right] \le P_e \le \frac{\rho}{2}. \tag{30} $$

Substituting $\rho = 2^{1-\zeta} - 1$, we can get the bounds in terms of our $\zeta$ measure. For equal prior probabilities, the Bhattacharyya coefficient gives a tight upper bound for large systems when $\rho \to 0$ (zero overlap) and the observations are independent and identically distributed. These bounds are also useful to discriminate between two processes with arbitrarily low error probability [9].
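The following sketch illustrates the substitution $\rho = 2^{1-\zeta} - 1$. It recovers $\rho$ from $\zeta$ and evaluates the bounds of Eq. (29). It then compares them with the exact $P_e$ of Eq. (11) for one hypothetical pair of discrete densities at several priors. Helper names are illustrative.

```python
import math


def rho_from_zeta(z):
    """Invert Eq. (3): rho = 2**(1 - zeta) - 1."""
    return 2.0 ** (1.0 - z) - 1.0


def kailath_bounds(rho, pi1=0.5):
    """Lower and upper bounds on P_e from Eq. (29)."""
    pi2 = 1.0 - pi1
    lower = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * pi1 * pi2 * rho ** 2))
    upper = math.sqrt(pi1 * pi2) * rho
    return lower, upper


def bayes_error(p1, p2, pi1=0.5):
    """Exact P_e of Eq. (11) for discrete densities, for comparison."""
    return sum(min(pi1 * a, (1.0 - pi1) * b) for a, b in zip(p1, p2))


p1, p2 = [0.7, 0.2, 0.1], [0.1, 0.2, 0.7]
rho = sum(math.sqrt(a * b) for a, b in zip(p1, p2))
z = 1.0 - math.log2(1.0 + rho)  # zeta(P1, P2)
for pi1 in (0.2, 0.5, 0.8):
    lo, hi = kailath_bounds(rho_from_zeta(z), pi1)
    print(pi1, lo, bayes_error(p1, p2, pi1), hi)
```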

B. f-divergence 

A class of divergence measures called f-divergences was introduced by Csiszár [29], [30] and independently by Ali and Silvey [2] (see [6] for a review). It encompasses many well-known divergence measures, including KLD, the variational distance and the Bhattacharyya distance. In this section, we show that our $\zeta$ measure belongs to the class of f-divergences.

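f-divergence [6]: Consider a measurable space $\Omega$ with a $\sigma$-algebra $\mathcal{B}$. Let $\lambda$ be a measure on $(\Omega, \mathcal{B})$ such that any probability laws $P_1$ and $P_2$ are absolutely continuous with respect to $\lambda$, with densities $p_1$ and $p_2$. Let $f$ be a continuous convex real function on $\mathbb{R}^+$, and let $g$ be an increasing function on $\mathbb{R}$. The class of divergence coefficients between two probabilities

$$ d(P_1, P_2) = g\left(\int f\!\left(\frac{p_2}{p_1}\right) p_1\,d\lambda\right) = g\big(E_1[f(L)]\big) \tag{31} $$

are called the f-divergence measures w.r.t. the functions $(f, g)$. Here $L = p_2/p_1$ is the likelihood ratio and $E_1$ is the expectation w.r.t. $P_1$.

The $\zeta(P_1, P_2)$ metric can be written as the following f-divergence:

$$ f(x) = -\frac{1}{4}\big(1 + \sqrt{x}\big)^2, \qquad g(F) = -\log_2(-F), \tag{32} $$

where

$$ F = \int f\!\left(\frac{p_2}{p_1}\right) p_1\,d\lambda = -\frac{1}{4}\int \big(\sqrt{p_1} + \sqrt{p_2}\big)^2\,d\lambda = -\frac{1 + \rho}{2}, \tag{33} $$

and

$$ g(F) = -\log_2(-F) = -\log_2\left(\frac{1 + \rho}{2}\right) = \zeta(P_1, P_2). \tag{34} $$

Here $f''(x) = \frac{1}{8}x^{-3/2} > 0$, so $f$ is convex on $\mathbb{R}^+$, and $g$ is increasing for $F < 0$.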
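The following sketch confirms the f-divergence representation numerically for discrete distributions, taking $\lambda$ to be the counting measure. Where $p_1(x) = 0 < p_2(x)$, the ratio $L$ is infinite. In that case the sketch uses the usual perspective convention $0 \cdot f(b/0) = b \lim_{x \to \infty} f(x)/x = -b/4$, which is our addition for handling zeros. It also estimates $f''(1)$ by central differences. Helper names are hypothetical.

```python
import math


def f(x):
    """Generator of Eq. (32): f(x) = -(1/4) (1 + sqrt(x))**2, convex on x >= 0."""
    return -0.25 * (1.0 + math.sqrt(x)) ** 2


def g(F):
    """Outer function of Eq. (32), increasing for F < 0."""
    return -math.log2(-F)


def zeta_as_f_divergence(p1, p2):
    """zeta = g(sum_x p1(x) f(p2(x) / p1(x))), Eq. (31) with L = p2 / p1.

    Where p1(x) = 0 we use the perspective convention 0 * f(b / 0) = -b / 4."""
    F = 0.0
    for a, b in zip(p1, p2):
        F += a * f(b / a) if a > 0.0 else -0.25 * b
    return g(F)


def zeta_direct(p1, p2):
    """Eq. (3) evaluated directly from the Bhattacharyya coefficient."""
    rho = sum(math.sqrt(a * b) for a, b in zip(p1, p2))
    return -math.log2((1.0 + rho) / 2.0)


p1 = [0.5, 0.5, 0.0]
p2 = [0.25, 0.25, 0.5]
print(zeta_as_f_divergence(p1, p2), zeta_direct(p1, p2))

# Second derivative of f at 1 by central differences; Eq. (32) gives f''(1) = 1/8.
h = 1e-4
print((f(1 + h) - 2 * f(1) + f(1 - h)) / h ** 2)
```

For this $f$, $p_1 f(p_2/p_1) = -\tfrac{1}{4}(\sqrt{p_1} + \sqrt{p_2})^2$. The zero-density convention therefore matches the middle expression of Eq. (33) term by term.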



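C. Generalized $\zeta$ measure

In decision problems involving more than two classes, it is very useful to have divergence measures involving more than two distributions [31], [22]. We use the generalized geometric mean (GGM) concept to define the $\zeta$ divergence for more than two distributions. The GGM of $n$ variables $p_1, p_2, \ldots, p_n$ with weights $\alpha_1, \alpha_2, \ldots, \alpha_n$, such that $\alpha_i > 0$ and $\sum_i \alpha_i = 1$, is given by

$$ \mathrm{GGM}(\{p_i\}) = \prod_{i=1}^{n} p_i^{\alpha_i}. $$

The generalized version of the $\zeta$ measure for $n$ probability measures $P_1, P_2, \ldots, P_n$ can be defined as:

$$ \zeta_{\alpha_1,\ldots,\alpha_n}(P_1, \ldots, P_n) = -\log_2\left(\frac{1 + \int \prod_{i=1}^{n} p_i^{\alpha_i}\,d\lambda}{2}\right), \tag{35} $$

where $\alpha_i > 0$ and $\sum_{i=1}^{n} \alpha_i = 1$. We note that $0 \le \zeta_{\alpha_1,\ldots,\alpha_n}(P_1, \ldots, P_n) \le 1$. The upper bound holds because the weighted geometric mean is minimized when any two of the probability densities $p_i$ are perpendicular to each other (have disjoint supports). The lower bound holds because it is maximized when all the $p_i$'s are the same.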
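The following sketch implements Eq. (35) for discrete distributions, with the counting measure playing the role of $\lambda$. `generalized_zeta` is a hypothetical name. With two distributions and weights $(1/2, 1/2)$ it reduces to Eq. (3).

```python
import math


def generalized_zeta(dists, weights):
    """Eq. (35): -log2((1 + sum_x prod_i p_i(x)**alpha_i) / 2) for discrete p_i."""
    if any(w <= 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be positive and sum to 1")
    overlap = 0.0
    for values in zip(*dists):          # values = (p_1(x), ..., p_n(x))
        term = 1.0
        for p, a in zip(values, weights):
            term *= p ** a              # weighted geometric mean at x
        overlap += term
    return -math.log2((1.0 + overlap) / 2.0)


p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
r = [0.0, 0.0, 1.0]
print(generalized_zeta([p, q], [0.5, 0.5]))            # reduces to zeta(P, Q)
print(generalized_zeta([p, r, q], [0.2, 0.3, 0.5]))    # p and r are disjoint
```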

IV. Curvature of $\zeta$ metric

In statistics, the information that an observable random variable $X$ carries about an unknown parameter $\theta$ (on which it depends) is given by the Fisher information. An important property of an f-divergence between two distributions of the same parametric family is that its curvature measures the Fisher information. Following the approach pioneered by Rao [32], in this section we relate the curvature of $\zeta$ measures to the Fisher information and derive the differential curvature metric. The following discussion closely follows DasGupta [19].

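Definition: Let $\{f(x|\theta);\ \theta \in \Theta \subset \mathbb{R}\}$ be a family of densities indexed by a real parameter $\theta$, satisfying some regularity conditions ($f(x|\theta)$ is absolutely continuous). Define

$$ \zeta_\theta(\phi) \equiv \zeta(\theta, \phi) = -\log_2\left(\frac{1 + \rho(\theta, \phi)}{2}\right), \tag{36} $$

where $\rho(\theta, \phi) = \int \sqrt{f(x|\theta)\,f(x|\phi)}\,dx$.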



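Theorem IV.1. The curvature of $\zeta_\theta(\phi)$ at $\phi = \theta$ is the Fisher information of $f(x|\theta)$ up to a multiplicative constant.

Proof: Expand $\zeta_\theta(\phi)$ around $\theta$:

$$ \zeta_\theta(\phi) = \zeta_\theta(\theta) + (\phi - \theta)\left.\frac{\partial \zeta_\theta(\phi)}{\partial \phi}\right|_{\phi=\theta} + \frac{(\phi - \theta)^2}{2}\left.\frac{\partial^2 \zeta_\theta(\phi)}{\partial \phi^2}\right|_{\phi=\theta} + \cdots \tag{37} $$

Let us observe some properties of the Bhattacharyya coefficient:

$$ \rho(\theta, \phi) = \rho(\phi, \theta), \qquad \rho(\theta, \theta) = 1. \tag{38} $$

Its derivatives are as follows, assuming differentiation under the integral sign is permitted:

$$ \left.\frac{\partial \rho(\theta, \phi)}{\partial \phi}\right|_{\phi=\theta} = \frac{1}{2}\int \frac{\partial f(x|\theta)}{\partial \theta}\,dx = 0, $$

$$ \left.\frac{\partial^2 \rho(\theta, \phi)}{\partial \phi^2}\right|_{\phi=\theta} = -\frac{1}{4}\int \frac{1}{f(x|\theta)}\left(\frac{\partial f(x|\theta)}{\partial \theta}\right)^2 dx + \frac{1}{2}\int \frac{\partial^2 f(x|\theta)}{\partial \theta^2}\,dx = -\frac{1}{4}\,I_f(\theta), \tag{39} $$

where $I_f(\theta)$ is the Fisher information of the distribution $f(x|\theta)$:

$$ I_f(\theta) = \int f(x|\theta)\left(\frac{\partial \ln f(x|\theta)}{\partial \theta}\right)^2 dx. \tag{40} $$

Using the above relationships, we can write down the terms in the expansion of Eq. (37). For brevity we neglect the $1/\ln 2$ factor, i.e., we use natural logarithms:

$$ \zeta_\theta(\theta) = 0, \qquad \left.\frac{\partial \zeta_\theta(\phi)}{\partial \phi}\right|_{\phi=\theta} = \left.-\frac{1}{1 + \rho}\frac{\partial \rho}{\partial \phi}\right|_{\phi=\theta} = 0, $$

$$ \left.\frac{\partial^2 \zeta_\theta(\phi)}{\partial \phi^2}\right|_{\phi=\theta} = \left[\frac{1}{(1 + \rho)^2}\left(\frac{\partial \rho}{\partial \phi}\right)^2 - \frac{1}{1 + \rho}\frac{\partial^2 \rho}{\partial \phi^2}\right]_{\phi=\theta} = \frac{1}{1 + \rho(\theta, \theta)}\cdot\frac{I_f(\theta)}{4} = \frac{I_f(\theta)}{8}. \tag{41} $$

Converting back to base-2 units, we can relate the $\zeta$ metric to the Fisher information as

$$ \zeta_\theta(\phi) = \frac{(\phi - \theta)^2}{16 \ln 2}\,I_f(\theta) + \cdots \tag{42} $$

■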
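Eq. (42) can be checked numerically on the Gaussian location family $N(\theta, \sigma^2)$, whose Fisher information for the mean is $1/\sigma^2$. The choice of family is ours, for illustration. The sketch computes $\rho(\theta, \phi)$ by a midpoint rule on a truncated interval, which is an approximation. It prints the ratio of the exact $\zeta_\theta(\phi)$ to the quadratic term of Eq. (42); the ratio should approach 1 as $\phi \to \theta$.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def bhattacharyya_coefficient(theta, phi, sd=1.0, n=20000):
    """rho(theta, phi) for N(theta, sd^2) and N(phi, sd^2), by the midpoint rule
    on a window of +-12 standard deviations (an approximation)."""
    lo = min(theta, phi) - 12.0 * sd
    hi = max(theta, phi) + 12.0 * sd
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += math.sqrt(normal_pdf(x, theta, sd) * normal_pdf(x, phi, sd))
    return total * h


def zeta(theta, phi, sd=1.0):
    """Eq. (36)."""
    return -math.log2((1.0 + bhattacharyya_coefficient(theta, phi, sd)) / 2.0)


theta, sd = 0.0, 1.0
fisher = 1.0 / sd ** 2  # Fisher information of the mean of N(theta, sd^2)
for delta in (0.5, 0.1, 0.02):
    quadratic = delta ** 2 * fisher / (16.0 * math.log(2.0))  # leading term of Eq. (42)
    print(delta, zeta(theta, theta + delta, sd) / quadratic)
```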



A. Differential Metrics

Rao [33] generalized the Fisher information to multivariate densities with vector-valued parameters to obtain a "geodesic" distance between two parametric distributions $P_\theta$, $P_\phi$ of the same family (see Section 15.4.2 in DasGupta [19] for details). We derive such a metric for the $\zeta$ measure. Let $\theta, \phi \in \Theta \subset \mathbb{R}^p$. Then, using the fact that

$$ \left.\frac{\partial \zeta(\theta, \phi)}{\partial \theta_i}\right|_{\phi=\theta} = 0, \tag{43} $$

we can easily show that

$$ d\zeta_\theta = \sum_{i,j=1}^{p} \frac{1}{2}\left.\frac{\partial^2 \zeta_\theta(\phi)}{\partial \phi_i\,\partial \phi_j}\right|_{\phi=\theta} d\theta_i\,d\theta_j = g_{ij}\,d\theta_i\,d\theta_j. \tag{44} $$

The curvature metric $g_{ij}$ can be used to find the geodesic on the curve $\eta(t)$, $t \in [0, 1]$, with

$$ C = \{\eta(t) : \eta(0) = \theta,\ \eta(1) = \phi\}. \tag{45} $$

The geodesic distance between $\theta$ and $\phi$ can be obtained by minimizing the length

$$ S = \int_0^1 \sqrt{\sum_{i,j} g_{ij}\big(\eta(t)\big)\,\frac{d\eta_i}{dt}\frac{d\eta_j}{dt}}\;dt \tag{46} $$

subject to the constraints in Eq. (45). The geodesic equations to be solved (assuming the summation convention) are:

$$ g_{kl}\frac{d^2\eta_l}{dt^2} + \Gamma_{ijk}\frac{d\eta_i}{dt}\frac{d\eta_j}{dt} = 0, \qquad k = 1, \ldots, p, \tag{47} $$

where the Christoffel symbols $\Gamma_{ijk}$ are given by:

$$ \Gamma_{ijk} = \frac{1}{2}\left(\frac{\partial g_{jk}}{\partial \theta_i} + \frac{\partial g_{ki}}{\partial \theta_j} - \frac{\partial g_{ij}}{\partial \theta_k}\right). $$

For our $\zeta(\theta, \phi)$ metric, one can easily show the following result by following the steps in Theorem IV.1 (neglecting the $\log 2$ factor for brevity):

$$ \left.\frac{\partial \zeta}{\partial \phi_i}\right|_{\phi=\theta} = \left.-\frac{1}{1 + \rho}\frac{\partial \rho}{\partial \phi_i}\right|_{\phi=\theta} = 0, $$

$$ \left.\frac{\partial^2 \zeta}{\partial \phi_i\,\partial \phi_j}\right|_{\phi=\theta} = \left[-\frac{1}{1 + \rho}\frac{\partial^2 \rho}{\partial \phi_i\,\partial \phi_j} + \frac{1}{(1 + \rho)^2}\frac{\partial \rho}{\partial \phi_i}\frac{\partial \rho}{\partial \phi_j}\right]_{\phi=\theta} = \frac{1}{8}\int \frac{1}{f(x|\theta)}\frac{\partial f(x|\theta)}{\partial \theta_i}\frac{\partial f(x|\theta)}{\partial \theta_j}\,dx. \tag{48} $$

This is just a numerical factor times the Fisher information metric for KLD. The reason is that the curvature metrics of all Csiszár f-divergences are scalar multiples of that of the KLD measure [?], [?]:

$$ g^{(f)}_{ij}(\theta) = f''(1)\,g_{ij}(\theta), \tag{49} $$

where $g_{ij}(\theta)$ on the right-hand side denotes the Fisher information metric (the curvature metric of KLD). For our $\zeta$ metric, with $f$ as in Eq. (32),

$$ f''(x) = \frac{1}{8x^{3/2}}, \qquad f''(1) = \frac{1}{8}. \tag{50} $$

Hence the result. It follows that the geodesic distance for our metric is the same as the KLD geodesic distance up to a multiplicative factor. KLD geodesic distances are tabulated in DasGupta [19].

V. Model selection

Model selection and parameter estimation of models of observed data are common problems in statistics, social science, life science and engineering. Many techniques exist for parameter estimation, such as minimizing the mean square error, maximum likelihood estimators, and Bayes estimators. We propose a method that can potentially be used to estimate parameters by minimizing the distance between the observed distribution and the model distribution.

Let $P(X)$ be the observed distribution. Suppose we have $n$ hypothesized distributions $\{H_j(a_j^l, X) : j = 1, 2, \ldots, n\}$, where $a_j^l$, $l = 1, \ldots, m_j$, are the parameters of distribution $H_j(X)$. We compute the $\zeta$ distance between $H_j$ and $P$,

$$ \zeta(P, H_j) = -\log_2\left(\frac{1 + \int \sqrt{P(X)\,H_j(a_j^l, X)}\,dX}{2}\right), \tag{51} $$

and minimize it w.r.t. the parameters:

$$ \frac{\partial \zeta(P, H_j)}{\partial a_j^l} = 0, \qquad l = 1, \ldots, m_j, \tag{52} $$

to get the estimated function $H_j(\hat{a}_j^l, X)$. We need to ensure that the solution of Eq. (52) is a minimum rather than a maximum of the $\zeta$ distance. We can then choose the distribution with the minimal $\zeta$ distance as the best fit.

If, instead, the model parameters are estimated through other means such as maximum likelihood or Bayes estimators, we need an additional criterion, such as the Akaike information criterion (AIC), to test the goodness of fit of models. Here we propose that choosing the model with the minimal distance between the observed and model distributions can serve as an additional criterion. AIC provides an asymptotically unbiased estimator of the KLD separation between data and model [34]. It works well when the sample size is large and the number of parameters is comparatively small. KLD is just one measure to assess the disparity between the true and candidate models. To the best of our knowledge, no studies have used other measures of disparity, such as the Hellinger, Bhattacharyya or Jensen-Shannon divergence, in AIC-type criteria. Here we propose to use these measures along with BBD to assess the disparity between data and model. We believe that these measures might shed additional light on parameter estimation and address limitations of KLD-based AIC. We have not tested our proposed method on real data. We leave it to future studies to test the robustness of such an approach to model selection and parameter estimation.
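The following minimal sketch illustrates the proposed procedure. It rests on several assumptions made for illustration only:

- The observed distribution is a synthetic histogram generated from Binomial(10, 0.3), not real data.
- The candidate models are a binomial family and a truncated Poisson family, each with one parameter.
- The stationarity condition of Eq. (52) is replaced by a derivative-free golden-section search, which presumes that $\zeta$ is unimodal in the parameter.

All helper names are hypothetical.

```python
import math


def zeta(p, q):
    """Eq. (3) for discrete distributions, as 1 - log2(1 + rho)."""
    rho = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 1.0 - math.log2(1.0 + rho)


def binomial_pmf(n, prob):
    """Binomial(n, prob) probabilities on {0, ..., n}."""
    return [math.comb(n, k) * prob ** k * (1.0 - prob) ** (n - k) for k in range(n + 1)]


def poisson_pmf(lam, kmax):
    """Poisson(lam) on {0, ..., kmax}, with the tail mass P(K >= kmax) lumped into kmax."""
    pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax)]
    pmf.append(max(0.0, 1.0 - sum(pmf)))
    return pmf


def fit_by_min_zeta(observed, model, lo, hi, iters=100):
    """Golden-section search for the parameter minimizing zeta(observed, model(param)).

    Assumes zeta is unimodal in the parameter on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    for _ in range(iters):
        if zeta(observed, model(c)) < zeta(observed, model(d)):
            b, d = d, c                  # minimum lies in [a, old d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                  # minimum lies in [old c, b]
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# Synthetic "observed" distribution on {0, ..., 10}, generated from Binomial(10, 0.3)
# purely for illustration (not real data).
n = 10
observed = binomial_pmf(n, 0.3)

candidates = {
    "binomial": (lambda s: binomial_pmf(n, s), 0.01, 0.99),
    "poisson": (lambda s: poisson_pmf(s, n), 0.1, 10.0),
}
for name, (model, lo, hi) in candidates.items():
    best = fit_by_min_zeta(observed, model, lo, hi)
    print(name, best, zeta(observed, model(best)))
```

The search only compares values of $\zeta$, so it looks for a minimum directly rather than for any stationary point. Under the unimodality assumption, this addresses the minimum-versus-maximum concern raised above. Models with several parameters would need a general-purpose optimizer instead.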

VI. Conclusion 

In this work, we have introduced a new bounded divergence measure based on the Bhattacharyya distance. It belongs to the class of f-divergences and shares their general properties. Although many bounded divergence measures have been studied and used in various applications, no single 'metric' is useful in all types of problems. Ours is based on the Bhattacharyya coefficient, which is useful in computing tight bounds on Bayes error probabilities. The bounded Bhattacharyya distance shares many common properties with the Hellinger, Bhattacharyya and Jensen-Shannon divergence measures, and we have provided several inequalities relating them. We have also proposed a new method for parameter estimation for probability distributions based on divergence measures. We are further investigating the properties of the BBD measure and plan to apply it to network models and population dynamics.
VIII. Appendix

A. $\zeta$ measures of some common distributions

• Binomial: $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$, $Q(k) = \binom{n}{k} q^k (1-q)^{n-k}$.

• Gaussian: $P(x) = \frac{1}{\sqrt{2\pi\sigma_p^2}} \exp\left(-\frac{(x-\mu_p)^2}{2\sigma_p^2}\right)$, $Q(x) = \frac{1}{\sqrt{2\pi\sigma_q^2}} \exp\left(-\frac{(x-\mu_q)^2}{2\sigma_q^2}\right)$.

• Exponential: $P(x) = \lambda_p e^{-\lambda_p x}$, $Q(x) = \lambda_q e^{-\lambda_q x}$.

• Pareto: Assuming the same cut-off $x_m$,
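As a hedged sketch, the Bhattacharyya coefficients for these families can be written in closed form under standard parameterizations, and $\zeta$ then follows from Eq. (3). The snippet below uses our own derivations of these forms. They are not expressions taken from the appendix, and the helper names are hypothetical. For the Pareto case it assumes the standard density $\alpha x_m^\alpha / x^{\alpha+1}$ on $x \ge x_m$. Two of the closed forms are cross-checked numerically.

```python
import math


def zeta_from_rho(rho):
    """Eq. (3): zeta = 1 - log2(1 + rho)."""
    return 1.0 - math.log2(1.0 + rho)


def rho_binomial(n, p, q):
    """Binomial(n, p) vs Binomial(n, q): (sqrt(pq) + sqrt((1-p)(1-q)))**n."""
    return (math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))) ** n


def rho_gaussian(mu_p, sd_p, mu_q, sd_q):
    """N(mu_p, sd_p^2) vs N(mu_q, sd_q^2)."""
    var_sum = sd_p ** 2 + sd_q ** 2
    return math.sqrt(2.0 * sd_p * sd_q / var_sum) * math.exp(-(mu_p - mu_q) ** 2 / (4.0 * var_sum))


def rho_exponential(lam_p, lam_q):
    """Exponential rates lam_p vs lam_q."""
    return 2.0 * math.sqrt(lam_p * lam_q) / (lam_p + lam_q)


def rho_pareto(a_p, a_q):
    """Assumed standard Pareto densities a * x_m**a / x**(a + 1) on x >= x_m,
    with a common cut-off x_m (which cancels)."""
    return 2.0 * math.sqrt(a_p * a_q) / (a_p + a_q)


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# Binomial: closed form versus the direct sum over k.
n, p, q = 12, 0.3, 0.55
direct = sum(
    math.sqrt(math.comb(n, k) ** 2 * (p * q) ** k * ((1 - p) * (1 - q)) ** (n - k))
    for k in range(n + 1)
)
print(direct, rho_binomial(n, p, q))

# Gaussian: closed form versus a midpoint rule on [-30, 30] (an approximation).
steps, lo, hi = 60000, -30.0, 30.0
h = (hi - lo) / steps
numeric = h * sum(
    math.sqrt(normal_pdf(x, 0.0, 1.0) * normal_pdf(x, 1.5, 2.0))
    for x in (lo + (k + 0.5) * h for k in range(steps))
)
print(numeric, rho_gaussian(0.0, 1.0, 1.5, 2.0))

print(zeta_from_rho(rho_exponential(1.0, 3.0)), zeta_from_rho(rho_pareto(1.5, 4.0)))
```

Under these parameterizations the exponential and Pareto families share the same functional form of $\rho$ in their rate or shape parameters.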